A fast dendrogram refinement approach for unsupervised expansion of hierarchies
Abstract
Hierarchies are effective data models for organizing textual collections, particularly for automatically classifying documents into categories and subcategories. However, most existing methods for hierarchical classification require a human-labeled set of documents. Moreover, humans are good at managing the categories at the higher levels of a hierarchy, i.e., the more general categories, whereas managing the more specific categories is a difficult and expensive task, since it requires expert knowledge to identify appropriate categories and their respective documents. Thus, in this paper we introduce an approach that automatically expands a reduced initial hierarchy, containing only general categories, with new and more specific categories. Our approach is based on text clustering methods; in particular, it refines the dendrograms obtained by hierarchical clustering algorithms. The results of the experimental evaluation show that the proposed approach achieves better performance in the expansion of hierarchies than a traditional technique. Moreover, our approach is computationally faster, allowing the identification of new categories in large text collections.
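The abstract does not spell out the refinement procedure itself, so the following is only a minimal sketch of the general setting it builds on. For each general category, a dendrogram is built over that category's documents and then cut to propose candidate subcategories. The term-frequency vectors, cosine similarity, average linkage, the fixed similarity threshold, and the function names (`vectorize`, `build_dendrogram`, `cut_dendrogram`, `expand_hierarchy`) are all hypothetical choices made for illustration. A plain fixed-threshold cut is used here, which is not the authors' dendrogram refinement, and this naive clustering is far from the "computationally faster" method the paper claims.

```python
import math
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Hypothetical preprocessing: lowercase, whitespace split, raw term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_dendrogram(vectors):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Returns the dendrogram as an ordered list of merges
    (cluster_a, cluster_b, similarity); clusters are frozensets of doc indices.
    """
    n = len(vectors)
    sim = {(i, j): cosine(vectors[i], vectors[j]) for i, j in combinations(range(n), 2)}

    def link(a, b):
        # Average pairwise similarity between the members of two clusters.
        pairs = [sim[(min(i, j), max(i, j))] for i in a for j in b]
        return sum(pairs) / len(pairs)

    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda p: link(*p))
        merges.append((a, b, link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def cut_dendrogram(merges, n, threshold):
    """Cut the dendrogram: keep only merges whose similarity >= threshold.

    Average linkage never produces inversions, so merge similarities are
    non-increasing and we can stop at the first merge below the threshold.
    """
    clusters = {frozenset([i]) for i in range(n)}
    for a, b, s in merges:
        if s < threshold:
            break
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


def expand_hierarchy(hierarchy, threshold=0.2):
    """Propose subcategories (lists of doc indices) under each general category.

    `hierarchy` maps a general category name to its list of document strings;
    `threshold` is a hypothetical cut height, not a value from the paper.
    """
    expanded = {}
    for category, docs in hierarchy.items():
        merges = build_dendrogram([vectorize(d) for d in docs])
        expanded[category] = cut_dendrogram(merges, len(docs), threshold)
    return expanded
```

For example, with a toy general category `"sports"` holding two football and two tennis snippets, `expand_hierarchy` proposes the two groups `[0, 1]` and `[2, 3]` as new subcategories. The quality of such a cut depends heavily on the chosen threshold, which is one reason a fixed cut can be a weak baseline for hierarchy expansion.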